EXAMPLES OF CODING SCHEMES
BASED ON STRATEGIC VALUE 


By W. W. Happ 

Electronics Research Center 
SUMMARY 

Procedures are developed to construct variable-length
codes for alphabets containing messages with assigned weighted-
information content. The resulting coding procedures are similar
in form to the Shannon-Fano-Huffman redundancy reduction schemes,
but differ in scope: the former maximize the strategic value,
while the latter maximize the number of letters for a specified
finite channel capacity. Criteria specifying the strategic value
of a code are defined, and codes are examined on the basis of
alphabets containing messages with weighted-information content.
Figures-of-merit are established and used to compare different
types of simple codes and the possible trade-offs among them.
Algorithms governing these codes are established and examined on
the basis of representative examples.


INTRODUCTION 

The figure-of-merit of a coding scheme measures its effec-
tiveness in meeting specific criteria in transmitting information.
Table I lists figures-of-merit and objectives of typical binary
alphabets. An extensive review and evaluation of the pertinent
literature are given in the bibliography.

A strategy can be defined as a management plan to execute
probable operations effectively. This plan consists of a set of
guidelines for formulating specific objectives which must be met
or approached by a particular operation. The operation to be
examined is the coding procedure. A strategy can be specified
by a single figure-of-merit, such as accuracy, error rate, or
compaction ratio. In general, strategy is based on:

(1) Two or more figures-of-merit describing the
effectiveness of the operation;

(2) Calculated or estimated trade-offs between
figures-of-merit in terms of the parameters
of the system; for example, an algorithm
relating the average length of a letter to
statistical properties of the assumed
distribution;

(3) Guidelines to control parameters, with an aim to
optimize partially conflicting trade-offs by assign-
ing weights to each set of trade-off parameters.


TABLE I 

REPRESENTATIVE CODING SCHEMES 


Coding Scheme             Figure-of-Merit         Objectives

Uniform k-digit length    Binary 2^k letter       Accuracy of 1 in 2^k
                          alphabet

Variable length           Average digits          Bandwidth
(Huffman)                 per letter

Orthogonal (Hamming)      Index of comma          Error elimination
code                      freedom

Some coding schemes, such as the Shannon-Fano-Huffman code,
strive towards economy; other codes minimize error rate.

From an "operations research" point of view, it is not
useful to ask "Is Huffman coding better than binary coding?";
the question is rather "What strategy is required?". Once
the strategy is specified, objectives and figures-of-merit can
be specified. Only then is it possible to determine whether
existing coding schemes, such as Huffman coding, serve a
specified purpose.

To illustrate a strategy with competing figures-of-merit,
two examples are used. First, a set of coded messages from an
observer to a command post is assumed and analyzed. Secondly,
run-length distributions of clustered events are examined.
Following these elementary examples, algorithms governing
strategic coding are established. The application of game theory,
dynamic programming, and other "OR" optimization techniques to
coding strategy is to be explored at a later date.

EXAMPLE 1: TWO-PARAMETER STRATEGY 

Table II lists the input data for optimization of message
transmission from an observer to a command post. The probability
of occurrence of a message, P(R), is given as a percentage of the
transmitted data. Not all letters, however, carry the same
information content. Respective weights P(A) can be assigned to
each letter, such that P(A) measures the information content or
strategic value associated with each letter.

The term "information content" is intended to define the
relative importance placed on transmitting this letter as com-
pared to others. The term "strategic value" describes the same
concept and is therefore used interchangeably with "information
content"; both terms are denoted by P(A). Similarly, the term
"letter" is used interchangeably with "strategic event".

Two codes are developed in Table II on the basis of differ-
ent criteria of effectiveness. The Minimum Redundancy Code
employs the Shannon-Fano-Huffman technique to maximize the
average information transfer. Assuming inadequate channel
capacity, the Minimum Redundancy Code generates unmanageably
large backlogs at the very time when data of high strategic value
place a premium on available channel capacity.


TABLE II 

TWO-PARAMETER STRATEGY: INPUT DATA AND CODES 


                           P(R)   P(A)   Minimum      Maximum
Letter   Interpretation     %      %     Redundancy   Access

A        Alert A             1     25    111111       11
B        Alert B             2     15    111110       10
C        Target C            3     11    11110        0111
D        Target D            3     11    11101        0110
E        Target E            3     11    11100        0101
F        Damage F            5      3    1101         01001
G        Damage G            5      3    1100         01000
H        Damage H            5      3    1011         00111
K        Damage K            5      3    1010         00110
L        Damage L            5      3    1001         00101
M        Damage M            5      3    1000         00100
N        Damage N            5      3    0111         00011
P        Damage P            5      3    0110         00010
Q        Weather Q          16      1    010          000011
R        Weather R          16      1    001          000010
S        Weather S          16      1    000          000001
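Both columns of code words in Table II can be checked mechanically for unique decodability. The following is a minimal sketch, not part of the original report: the helper names `is_prefix_free` and `kraft_sum` are hypothetical, and the code words are copied from Table II in letter order A to S.

```python
from fractions import Fraction

# Code words copied from Table II, in letter order A..S.
MIN_REDUNDANCY = [
    "111111", "111110", "11110", "11101", "11100",
    "1101", "1100", "1011", "1010", "1001", "1000", "0111", "0110",
    "010", "001", "000",
]
MAX_ACCESS = [
    "11", "10", "0111", "0110", "0101",
    "01001", "01000", "00111", "00110", "00101", "00100", "00011", "00010",
    "000011", "000010", "000001",
]

def is_prefix_free(words):
    """Return True if no code word is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in words for b in words)

def kraft_sum(words):
    """Sum of 2**(-length) over all code words (at most 1 for a prefix code)."""
    return sum(Fraction(1, 2 ** len(w)) for w in words)

print(is_prefix_free(MIN_REDUNDANCY), kraft_sum(MIN_REDUNDANCY))
print(is_prefix_free(MAX_ACCESS), kraft_sum(MAX_ACCESS))
```

A Kraft sum of exactly 1 means the code tree is fully used; a sum below 1 means at least one branch of the tree is left unassigned.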


Several alternatives exist:

(1) Transfer data from "peak load" periods to quieter times.

(2) Assign a priority to processing of data by allocating
a weight P(A), referred to as strategic value, to
each letter of the alphabet.

(3) Develop a code which permits access of a larger number
of letters when information of high information con-
tent is transmitted, and which permits lower access
rates when letters of low information content are
transmitted.

Alternatives (1) and (2) above require additional knowledge
of the performance characteristics of the memory devices. Alter-
native (3) proposes a solution in terms of coding schemes which
can be specified in terms of suitable figures-of-merit.

COMPARISON OF ALTERNATIVE STRATEGIES 

It will be assumed that the code to be developed is to be 
effective under the following conditions: 

(1) The system is fed data at a rate that exceeds
maximum channel capacity, and data compression
or elimination is required. This is a reasonable
assumption, since with sufficient channel capacity
no codes are needed.

(2) One letter only (not two or zero) can be transmitted 
at one time. The system is in one of several 
possible states. 

(3) The system remains in one state for a period com- 
mensurate with the average length of two letters, in 
accordance with the minimum requirements of the 
Nyquist sampling theorem. 

(4) More letters with high information content can be 
processed than letters with low information content. 
Access to information transmission is facilitated 
for high information content. 

Codes, so defined, provide maximum access to available 
information and may be referred to as Access Generating Codes, 
as opposed to the Redundancy Eliminating Codes developed by 
Shannon, Fano, and Huffman. Under the above assumptions, for 
instance, "Alert" messages would be provided three times the 
bandwidth allocated "Weather" data. On the other hand, in 
Redundancy Eliminating Codes, strategic data are discriminated 
against because they occur only rarely. 
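The contrast between the two kinds of code can be illustrated by running one standard construction on two different weightings. The sketch below is an assumption-laden illustration, not the author's procedure: it uses the textbook binary Huffman construction, the function names `huffman_lengths` and `average_length` are hypothetical, and the weights are the P(R) and P(A) percentages of Table II. The resulting code-word lengths need not coincide with those tabulated in the report.

```python
import heapq
from itertools import count

def huffman_lengths(weights):
    """Return the code-word length of each symbol in a binary Huffman code."""
    tie = count()  # tie-breaker so that symbol lists are never compared
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:  # every symbol under the merged node gains one digit
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), s1 + s2))
    return lengths

def average_length(weights, lengths):
    """Weighted mean code-word length in digits."""
    return sum(w * n for w, n in zip(weights, lengths)) / sum(weights)

# Percentages from Table II, letters A..S.
P_R = [1, 2, 3, 3, 3] + [5] * 8 + [16] * 3   # frequency of occurrence
P_A = [25, 15, 11, 11, 11] + [3] * 8 + [1] * 3  # strategic value

len_r = huffman_lengths(P_R)  # short words go to frequent letters
len_a = huffman_lengths(P_A)  # short words go to valuable letters

print(len_r, average_length(P_R, len_r))
print(len_a, average_length(P_A, len_a))
```

Weighting by P(R) gives the rare "Alert" letters the longest words, while weighting by P(A) gives them the shortest, which is the distinction drawn above between Redundancy Eliminating and Access Generating Codes.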

Input data in Table II are condensed in Table III, with
L(R) and L(A) being the number of digits for each group of letters
under the Minimum Redundancy and Maximum Access codes,
respectively. In Table IV, figures-of-merit are then compared for
redundancy coding and access coding. A trade-off must be effected
between maximizing the transmission rate of letters and that of
information transfer.


TABLE III 

COMPARISON OF ALTERNATIVE STRATEGIES 



Group       N     P(R)    P(A)    L(R)    L(A)

Alert       1     .01     .25      6       2
Alert       1     .02     .15      6       2
Targets     3     .03     .11      5       4
Damage      8     .05     .03      4       5
Weather     3     .16     .01      3       6


TABLE IV 

TWO-PARAMETER STRATEGY 



                          Minimum       Maximum
                          Redundancy    Access     Uniform
                          Code          Code       Binary

Average number of
letters per 100 bits         27           19         25

Information content
per 100 bits                 20           33         25
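The first row of Table IV follows directly from Table III. The sketch below is illustrative only: the names `weighted_length` and `per_100_bits` are hypothetical, and the uniform binary code is assumed to use 4 bits for the 16 letters. The report does not state how the second row was computed; weighting the word lengths by P(A) is shown purely as one possible reading and should not be expected to match that row exactly.

```python
# Rows of Table III: (group, letters in group N, P(R) per letter,
# P(A) per letter, digits L(R), digits L(A)).
TABLE_III = [
    ("Alert A", 1, 0.01, 0.25, 6, 2),
    ("Alert B", 1, 0.02, 0.15, 6, 2),
    ("Targets", 3, 0.03, 0.11, 5, 4),
    ("Damage", 8, 0.05, 0.03, 4, 5),
    ("Weather", 3, 0.16, 0.01, 3, 6),
]
UNIFORM_BITS = 4  # assumption: 16 letters coded with 4 bits each

def weighted_length(weight_index, length_index):
    """Mean word length, weighting each letter by the chosen column."""
    return sum(row[1] * row[weight_index] * row[length_index]
               for row in TABLE_III)

def per_100_bits(mean_length):
    """Number of average-length words that fit into 100 bits."""
    return 100.0 / mean_length

# Letters per 100 bits: weight the word lengths by frequency P(R).
letters_min_redundancy = per_100_bits(weighted_length(2, 4))
letters_max_access = per_100_bits(weighted_length(2, 5))
letters_uniform = per_100_bits(UNIFORM_BITS)

# Assumed reading of "information content": weight by P(A) instead.
value_min_redundancy = per_100_bits(weighted_length(3, 4))
value_max_access = per_100_bits(weighted_length(3, 5))

print(round(letters_min_redundancy), round(letters_max_access),
      round(letters_uniform))
print(round(value_min_redundancy), round(value_max_access))
```

Under either weighting the trade-off is visible: the Minimum Redundancy Code carries more letters per 100 bits, while the Maximum Access Code carries more strategic value.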


EXAMPLE 2: RUN-LENGTHS WITH CLUSTERS

In many applications requiring readout of coding data, the 
octal notation has distinct advantages over the decimal notation. 
Transformation rules between the various notations commonly used 
are summarized below: 

(1) Binary-to-octal rules are listed in Tables V and VI.
For example, 5 = 101*B (5 is equivalent to 101 in
binary notation), and 7325 becomes 111 011 010 101
in binary.

TABLE V

OCTAL NOTATION


N      Binary     N^2

0      000         0
1      001         1
2      010         4
3      011        11
4      100        20
5      101        31
6      110        44
7      111        61


TABLE VI 

OCTAL MULTIPLICATION AND DIVISION 



(2) In decimal-to-octal, 10 = 8*D and 11 = 9*D. For
example, 147 = 103*D or 47 = 39*D. Values of N^2
are listed in Table V.

(3) Multiplication and division are simple and are listed
in Table VI, together with the conversion of rational
fractions to octal fractions.

(4) Rapid mental calculations are possible from a knowledge
of the logarithms listed in Table V.

For example, $\log_8 4 = (1/3) \log_2 4 = 2/3 = .53$.

For clarity and conciseness, all following calculations
are given in octal notation unless otherwise specified. For
instance, 2/3 = .67*D = .53. Consider a series of runs with a
maximum length of 10^4, or 3 x 4 = 14 bits. These events are
observed to cluster with signature data as specified in Table VII.
Events D are of greatest interest, with moderate emphasis on
events A, E, C, B, and F, in that sequence.
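The conversion rules above can be checked with a few lines of code. This is a minimal sketch using only built-in integer formatting; the function names are hypothetical, and `fraction_to_octal` assumes a proper fraction that does not round up to 1.

```python
def octal_to_binary(octal_digits):
    """Expand each octal digit into its 3-bit binary group."""
    return " ".join(format(int(d, 8), "03b") for d in octal_digits)

def octal_to_decimal(octal_digits):
    """Interpret a string of octal digits as a decimal integer."""
    return int(octal_digits, 8)

def fraction_to_octal(numerator, denominator, places=2):
    """Round numerator/denominator (< 1) to `places` octal digits."""
    scaled = round(numerator * 8 ** places / denominator)
    return "." + format(scaled, "o").zfill(places)

print(octal_to_binary("7325"))   # binary groups for octal 7325
print(octal_to_decimal("147"))   # decimal value of octal 147
print(fraction_to_octal(2, 3))   # 2/3 as a two-digit octal fraction
print(format(3 * 4, "o"))        # 3 x 4 written in octal
```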


TABLE VII 

HYPOTHETICAL SIGNATURES FOR STRATEGIC EVENTS 


                            Strategic Value
Group    Run Length        Event       Group

A        01 - 10           .004        .040
B        11 - 20           .001        .010
C        21 - 40           .002        .040
D        41 - 50           .020        .200
E        51 - 100          .004        .140
F        101 - 200         .001        .100
G        201 - 400         .0003       .060
H        401 - 1000        .00004      .020
K        1001 - 2000       .00003      .030
L        2001 - 4000       .00002      .040
M        4001 - 10000      .00001      .040
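Because Table VII is written in octal, its internal consistency is easiest to confirm with exact arithmetic. The sketch below is an illustration rather than part of the report; it assumes that the group value equals the number of run lengths in the group times the value per event, and the helper names are hypothetical.

```python
from fractions import Fraction

# Table VII in octal:
# (group, first run length, last run length, value per event, value per group)
TABLE_VII = [
    ("A", "01", "10", ".004", ".040"),
    ("B", "11", "20", ".001", ".010"),
    ("C", "21", "40", ".002", ".040"),
    ("D", "41", "50", ".020", ".200"),
    ("E", "51", "100", ".004", ".140"),
    ("F", "101", "200", ".001", ".100"),
    ("G", "201", "400", ".0003", ".060"),
    ("H", "401", "1000", ".00004", ".020"),
    ("K", "1001", "2000", ".00003", ".030"),
    ("L", "2001", "4000", ".00002", ".040"),
    ("M", "4001", "10000", ".00001", ".040"),
]

def octal_fraction(text):
    """Parse an octal fraction such as '.004' into an exact Fraction."""
    digits = text.split(".")[1]
    return Fraction(int(digits, 8), 8 ** len(digits))

def events_in_group(first, last):
    """Number of distinct run lengths from first to last (octal strings)."""
    return int(last, 8) - int(first, 8) + 1

def group_value(first, last, per_event):
    """Assumed relation: events in the group times the value per event."""
    return events_in_group(first, last) * octal_fraction(per_event)

consistent = all(group_value(f, l, e) == octal_fraction(g)
                 for _, f, l, e, g in TABLE_VII)
total = sum(octal_fraction(g) for *_, g in TABLE_VII)
print(consistent, total)
```

Under that assumption every row agrees with its tabulated group value, and the group values add up to unity.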


EXAMPLE 3: MAXIMUM ACCESS CODE

The procedure for developing a Maximum Access Code for the
preceding example (Run-Lengths with Clusters) is presented
in Table VIII and is as follows.

(1) List all letters of the alphabet ($N_1$) in descending
order of strategic value ($N_3$) assigned to each group.

(2) In most cases, the number of events per group ($N_2$) is
a power of 2, that is, of the form $2^N$. In the above
example, this is true for all letters except E. The
group E is divided into three subgroups of $2^3$ = 10
events each. The number of events per group is also
listed in column $B_1$, where $B_1 = \log_2 N_2$.

(3) Using the same procedure which led to the Access Code
in Table II, a code of length $B_2$ is derived from $N_3$.
It is not necessary to write out the code explicitly;
the set of values $B_2$ is sufficient to proceed.

(4) The length of the code in bits is given, therefore, by
$B_3 = B_1 + B_2$. In Table IX, each group of letters is
listed in descending order of strategic value.
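Steps (2) and (4) can be sketched as follows. This is only an illustration under stated assumptions: the function name `code_lengths` is hypothetical, the $N_2$ and $B_2$ values are copied from Table VIII in octal, and $B_2$ is taken as given rather than derived.

```python
# (letter group N_1, events per group N_2 in octal, code length B_2)
TABLE_VIII_INPUT = [
    ("D", "20", 2), ("F", "100", 3), ("G", "200", 4),
    ("EX", "10", 4), ("EY", "10", 4), ("EZ", "10", 4),
    ("A", "10", 4), ("C", "20", 4), ("L", "2000", 4),
    ("M", "4000", 4), ("K", "1000", 4), ("H", "400", 5), ("B", "10", 5),
]

def code_lengths(n2_octal, b2):
    """Return (B_1, B_3) as integers, with B_1 = log2(N_2) and B_3 = B_1 + B_2."""
    n2 = int(n2_octal, 8)
    b1 = n2.bit_length() - 1  # exact log2 when N_2 is a power of 2
    if 2 ** b1 != n2:
        raise ValueError("N_2 must be a power of 2; split the group first")
    return b1, b1 + b2

for name, n2, b2 in TABLE_VIII_INPUT:
    b1, b3 = code_lengths(n2, b2)
    # Print the lengths in octal, the notation used in Table VIII.
    print(name, n2, format(b1, "o"), b2, format(b3, "o"))
```

The `ValueError` branch corresponds to the case of group E, which has to be divided into subgroups before $B_1$ can be formed.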

To write down the code explicitly for each letter in descending 
order of strategic value, the following approach is useful. 

(1) A word with 6 bits will start from 00 = 000 000*B
and proceed to 17 = 001 111*B, if there are 20 letters
in this group.

(2) If binary digits are denoted by 8 = 0*B and 9 = 1*B,
then the 10 letters in group A will begin at
208 = 010 000 0*B and end at 239 = 010 011 1*B.

(3) For a 10-bit word, two binary digits are needed; for
example, letter C starts at 4388 = 100 011 00*B and
runs for 20 numbers to 4799 = 100 111 11*B.
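The mixed notation used here, in which the digits 0 to 7 each stand for three bits while 8 and 9 stand for the single bits 0 and 1, can be expanded mechanically. The function name `expand_mixed` in this sketch is hypothetical.

```python
def expand_mixed(code):
    """Expand mixed notation into a bit string.

    Digits 0-7 contribute three bits each; 8 stands for a single 0 bit
    and 9 for a single 1 bit.
    """
    bits = []
    for ch in code:
        if ch == "8":
            bits.append("0")
        elif ch == "9":
            bits.append("1")
        else:
            bits.append(format(int(ch, 8), "03b"))
    return "".join(bits)

for word in ("00", "17", "208", "239", "4388", "4799"):
    expanded = expand_mixed(word)
    print(word, expanded, len(expanded))
```

The length of each expanded word gives the number of bits: six for group D, seven for group A, and eight (octal 10) for group C.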

Before embarking on a systematic analysis of figures-of- 
merit for variable length codes, it is instructive to evaluate 
relative advantages intuitively. 

Uniform binary coding requires 14 bits for an alphabet of
10,000 letters, whereas the most valuable group, D, is coded in
6 bits. Thus, while the average length-per-letter is increased
by only 20 to 30 percent in this example, the bandwidth for
strategically important data is increased by a ratio of 14/6.
Improvement of transmission of strategic content by a factor of
100 is well within the capability of suitably designed coding
schemes.

TABLE VIII


COMPUTATION OF STRATEGIC CODE


N_1     N_2      N_3     B_1     B_2     B_3

D       20       20       4       2       6
F       100      10       6       3      11
G       200       6       7       4      13
EX      10        4       3       4       7
EY      10        4       3       4       7
EZ      10        4       3       4       7
A       10        4       3       4       7
C       20        4       4       4      10
L       2000      4      12       4      16
M       4000      4      13       4      17
K       1000      3      11       4      15
H       400       2      10       5      15
B       10        1       3       5      10


TABLE IX 

CODING SCHEME IN OCTAL NOTATION FOLLOWED 
BY BINARY 8 = 0*B or 9 = 1*B 


N_1     N_2      B_3     Begin      End

D       20        6      00         17
A       10        7      208        239
E       30        7      248        379
B       10       10      4088       4299
C       20       10      4388       4799
F       100      11      500        577
G       200      13      6008       7439
H       400      15      74408      74779
K       1000     15      75008      75779
L       2000     16      760088     767799
M       4000     17      77000      77777




SEARCH FOR FIGURES-OF-MERIT


As an initial step towards developing criteria specifying 
the usefulness of a code, a mathematically simple distribution 
of data is assumed. Criteria measuring significant properties 
of the code are then defined, with the aim of developing an under- 
lying theory and of searching for a unified and general treat- 
ment of strategic codes. 

Variable length codes are derived, therefore, by minimizing
the average number of digits per letter for an alphabet governed
by the geometrical distribution. By applying the Shannon-Fano-
Huffman redundancy reduction procedure to an alphabet in which
successive letters have a geometrically tapered probability of
occurrence, figures-of-merit of the code can be evaluated. An
algorithm relating the tapering ratio to the number of letters
of equal length is then derived, and compaction ratios for values
of practical interest are computed.

PROPERTIES OF ASSUMED DATA DISTRIBUTION 

The geometric probability distribution (ref. 1) is a
single-parameter distribution with mean m. It is sometimes
convenient to define $q = 1/m$, $p = 1 - q$, and $r = 1/p$. The
probability of obtaining n successive digits of one type is then
$f(n) = q p^n$, and $g(n,s) = q(1 - p e^{ns})^{-1}$ is the
corresponding generating function. For example, the probability
of obtaining the letter 11110 is the conditional probability of
obtaining four favorable trials, $p^4$, followed by one
unfavorable trial. Experimentally, the single parameter m is
obtained from any one of


      E(n) = m             the expected value

      S(n) = m             the standard deviation

      D(n) = 2 m f(m)      the mean deviation.


In a previous report (ref. 1, p. 10), an algorithm of Huffman
coding for the geometric distribution was developed; namely,
the condition

      $r^k = 2$     or     $k = \ln(1/2) / \ln p$

gives the minimum value of p such that each k letters of the
alphabet have exactly the same length.

Typical maximum values for m which correspond to a pre-
assigned k are given in Table X. If $m \gg 1$, then
$\ln p \approx -1/m$ and $k \approx m \ln 2$.
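The condition above and its large-m approximation can be compared numerically. This is a minimal sketch in ordinary decimal arithmetic (Table X is given in octal); the function names are hypothetical, and p is obtained from m through q = 1/m and p = 1 - q as defined earlier.

```python
import math

def letters_of_equal_length(p):
    """Solve r**k = 2 with r = 1/p, that is, k = ln(1/2) / ln(p)."""
    return math.log(0.5) / math.log(p)

def letters_of_equal_length_approx(m):
    """Large-m approximation k = m ln 2."""
    return m * math.log(2)

for m in (2.0, 5.0, 20.0, 100.0):
    p = 1.0 - 1.0 / m
    exact = letters_of_equal_length(p)
    approx = letters_of_equal_length_approx(m)
    print(f"m = {m:6.1f}   k exact = {exact:8.3f}   k approx = {approx:8.3f}")
```

For m = 2 the exact value is k = 1, and the approximation approaches the exact value from above as m grows.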

CRITERIA FOR ASSUMED DATA DISTRIBUTION

The geometric probability distribution is a useful mathe-
matical artifice because of its simplicity; however, it is
difficult to justify on the basis of a valid statistical model
of incoming data samples. The geometrical distribution of a
dependent variable x as a function of an independent variable
t is based on the occurrence of phenomena of the type

      $f(t) = -dx/dt = x/m$     or     $x/m = \exp(1 - t/m)$.

The constant m will then be the ratio of the mean time of occur-
rence of an event to the sampling time. It is also useful to
define the so-called "half-time" of the distribution,
$k = m \ln 2$, that is, the time interval during which x falls to
one half its initial value. If the resulting distribution,
$x_2/x_1 = (1/2)^{t/k}$, is sampled at regular intervals t, the
ratio of any two successive samples, such as $x_1$ and $x_2$, is:

      $r = x_2/x_1 = (1/2)^{t/k}$     or     $r^k = 1/2$ if $t = 1$.

(Here r denotes the ratio of a sample to its predecessor and is
therefore the reciprocal of the ratio r = 1/p defined earlier.)

Thus, k is the number of samples during the time required for the
signal probability to change to one half its value. To appraise
the nature of k, the similarity to the concepts leading to the
definition of the Nyquist sampling rate is noteworthy. If k = 1,
a large number of short samples remain unresolved, much as in the
Nyquist criterion, which states that if the sampling frequency is
less than twice the highest signal frequency, the higher
frequencies have an increasingly significant probability of
remaining unresolved.
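The relation between m, the half-time k, and the ratio of successive samples can be written out step by step. The short derivation below is a sketch that assumes an arbitrary initial value $x(0)$, a symbol not used in the report; it relies only on the differential relation and the definition of the half-time given above.

```latex
\begin{align*}
-\frac{dx}{dt} &= \frac{x}{m}
  &&\Longrightarrow\quad x(t) = x(0)\, e^{-t/m}, \\
\frac{x(k)}{x(0)} &= \frac{1}{2}
  &&\Longrightarrow\quad e^{-k/m} = \frac{1}{2}
  \quad\Longrightarrow\quad k = m \ln 2, \\
\frac{x(t_0 + t)}{x(t_0)} &= e^{-t/m}
  = \left(e^{-k/m}\right)^{t/k}
  &&\Longrightarrow\quad \frac{x(t_0 + t)}{x(t_0)}
  = \left(\frac{1}{2}\right)^{t/k}.
\end{align*}
```

Setting t = 1 in the last line gives a ratio whose k-th power is 1/2, independent of the starting time $t_0$.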


INTERPRETATION OF CRITERIA 

The number of samples per half-time k is then a measure of 
the probable change in letter length and, hence, is related to m, 
the number sampled during the mean time between changes. The 
value of m specifies 

(1) The mean time between occurrences in units of sampling 
time; 

(2) The time for changes of 1/e in the probability spectrum; 

(3) The rate of change dx/dt. 

For a discrete distribution, $t = n$ and $\Delta t = 1$, with
$f(t) \to f(n)$. Thus, there are n digits in a letter and
$f(n) = m^{-n}(1 - m)$ is the corresponding probability density.
The resultant geometric distribution of mean length m is defined
by m samples taken during the mean time between changes.

An alphabet having any k letters of the same length has a 
mean length of m, but the length of each letter is usually either 
shorter or longer than m. The value of m can be interpreted in 
terms of the number of sampling periods necessary to acquire the 
information content for an average letter. Loosely speaking, the 
information stored in each letter is m times as long as the sam- 
pling period. 

To illustrate this, consider two distinct examples. For 
small m, say m = 2, it is reasonable to expect that far greater 
emphasis is placed on short letters than on long letters of the 
alphabet; therefore, only one letter is found at each level. 

For m large, say 20, letters requiring a large number of sampling 
periods do not differ greatly in probability from shorter letters. 
It follows that a far larger number of letters can be used effec- 
tively; many samples are needed, therefore, for each length of 
letter. 

For large m, the number of letters which will have the 
same length can be predicted for a binary alphabet from the 
properties of the tree associated with a variable length code. 

FORMULATION OF ALGORITHM 

At each branching point associated with the tree, the prob-
ability density falls by a factor b. If r is the ratio of prob-
ability for successive letters, then for the simplest case b = 2;
hence $k = m \ln 2$. If any k letters have the same length, then
$r^k = b$ or $\ln b = k \ln r$. The factor k can, thus, be
interpreted from considerations related to information content
and may be defined as the number of digits between letters
differing in probability by a factor of 2, while m is the
difference in digits between letters differing in probability by
a factor of "e". In a continuous geometrical distribution, the
difference in digits between letters differing in probability by
a factor of "e" is clearly equal to the mean length, while in a
distribution with discrete values of n this value is approached
only if m is large.

COMPACTION RATIOS 

Table XI gives a comparison of compaction ratios for codes
based upon an assumed geometrical distribution. Given the
required accuracy A of coded data, the length L(B) of data coded
in binary is an integer equal to or greater than $-\log_2 A$. The
mean length of a letter for the geometric distribution is given
in Table X for values of k. If k is restricted to integers, then
m is a maximum; hence the compaction ratio in Table XI is calcu-
lated using the most unfavorable case.

A scale-of-$2^b$ binary counter can be used to give logarith-
mic compaction by measuring on the highest level in operation.
This scale is equivalent to a geometric distribution with a ratio
of r = 2. If r differs significantly from this ratio, Huffman
coding may shorten or lengthen the average length of a letter,
depending on the correlation between assumed and actual distri-
bution of data.


TABLE X 

ALPHABETS WITH k LETTERS OF EQUAL LENGTH 
CALCULATED FROM THE EXPECTED VALUE m OF 
THE ASSUMED GEOMETRICAL DISTRIBUTION* 


m             q        p        r        k

2.0           0.40     0.40     2.0      1
3.3           0.24     0.54     1.3      2
[illegible]   0.1      0.63     1.2      3
24.0          0.04     0.74     1.04     16


*See Bibliography (No. 8)


TABLE XI 

COMPACTION RATIO FOR DATA FROM ASSUMED GEOMETRICAL 
DISTRIBUTION WITH k LETTERS OF EQUAL LENGTH 


   Binary Code               Geometric Distribution

A             L(B)        k = 1      k = 3      k = 16

.02             5          2.4        1.4        0.20
.002           11          4.4        2.5        0.35
[illegible]    25         13.0        6.5        1.1

CONCLUSIONS


An orderly coding scheme is established, yielding at a
glance the strategic importance of each message in at least two
ways: (1) the shorter the message, the greater its importance;
and (2) the higher the initial digit, the shorter its length.

To formulate a more general approach, an analytical model 
was formulated, which entailed several simplifying assumptions. 
This simplified model was used to explore the usefulness of per- 
formance criteria for alternate codes and coding schemes. 

This investigation was directed principally towards exploring
specific examples. Subsequent studies are planned towards the
development of a unified and generalized treatment of figures-
of-merit for alternative coding schemes.

It appears particularly desirable to define more rigorously 
the terminology used here; for example, strategic value, strategic 
events, and strategic coding. Criteria specifying the strategic 
value of a code deserve rigorous examination. For example, how 
does the strategic value of a code relate to the strategic value 
of data? These and related problems deserve scholarly scrutiny 
and discussion with adequate emphasis on aerospace-oriented 
applications. Finally, an analysis must be provided to strengthen 
the validity of the technique and establish further examples of 
its usefulness. Specifically, it is essential to exhibit those 
situations in which this technique does or does not apply. Work 
towards these objectives is now in progress. 

